Thermal heterogeneity is an important factor for maintaining the genetic differentiation pattern of the pelagic barnacle Lepas anatifera in the northwest Pacific

Abstract Macrobenthic invertebrates are ubiquitously distributed in the epipelagic zone of the open ocean. Yet, our understanding of their genetic structure patterns remains poorly understood. Investigating the genetic differentiation patterns of pelagic Lepas anatifera and clarifying the potential roles of temperature maintaining this pattern are crucial for our understanding of the distribution and biodiversity of pelagic macrobenthos. In the present study, mitochondrial cytochrome oxidase subunit I (mtDNA COI) from three South China Sea (SCS) populations and six Kuroshio Extension (KE) region populations of L. anatifera sampled from fixed buoys and genome‐wide SNPs from a subset of populations (two SCS populations and four KE region populations) were sequenced and analyzed for investigating the genetic pattern of the pelagic barnacle. Water temperature was different among sampling sites; in other words, the water temperature decreased with latitude increases, and the water temperature on the surface was higher than in the subsurface. Our result showed that three lineages with clear genetic differentiation were found in different geographical locations and depths based on mtDNA COI, all SNPs, neutral SNPs, and outlier SNPs. Lineage 1 and lineage 2 were dominant in the subsurface populations and surface populations from the KE region, respectively. Lineage 3 was dominant in the SCS populations. Historical events during the Pliocene epoch shaped the differentiation of the three lineages, while, nowadays, temperature heterogeneity maintains the current genetic pattern of L. anatifera in the northwest Pacific. The subsurface populations were genetically isolated from the surface populations in the Kuroshio Extension (KE) region, implying small‐scale vertical thermal heterogeneity was also an important factor maintaining the genetic differentiation pattern of the pelagic species.


| INTRODUC TI ON
Macrobenthic invertebrates are ubiquitously distributed in the epipelagic zone of the open ocean (Carlton et al., 2017), while we are rarely understanding their phylogeographic patterns there.
Phylogeographic studies provide critical insights into the history of population divergence (Avise, 2000) and further improve our understanding of how environmental change impacted the distribution and diversity of the species (Zhang et al., 2019). Especially in the open ocean, where numerous species are highly sensitive to global warming, exploring the role of temperature shaping the phylogeographic patterns can enhance our understanding of evolutionary processes and our ability to predict species' responses to environmental change (Pinsky et al., 2019). Macrobenthic invertebrates, due to their wide distribution, long residency in specific habitats, and high sensitivity to changes in various environmental conditions (Mohan et al., 2013), are ideal species to investigate their phylogeographic patterns and clarify the potential drivers.
In the open sea, environmental gradients play an important role in genetic divergence (Hudson et al., 2017); for example, large-scale temperature heterogeneity can lead to local adaptation and genetic differences (Saeedi et al., 2019;Sasaki & Dam, 2019), implying that genetic divergence at depth gradients contains important information for the understanding of phylogenetic relationships and evolutionary history (Behrens et al., 2021;Stefánsson et al., 2009). Therefore, phylogeographic studies conducted in both vertical and horizontal directions might deepen our understanding of the genetic divergence or species evolutionary history of marine species.
The goose barnacles of the genus Lepas were considered ideal candidates to investigate population differentiation in the open ocean due to their cosmopolitan distribution (Schiffer & Herbig, 2016). L. anatifera is the goose barnacle with the highest abundance worldwide (Young, 1990). They have been reported from various floating objects including plastics (Aliani & Molcard, 2003), pumice stones (Coombs & Landis, 1966), woods (Nilsson-Cantell, 1930), and drifting kelp rafts (Hobday, 2000). Their planktonic larval stage can take up to 2 months (Anderson, 1994;Darwin, 1851;Hinojosa et al., 2006), thus enabling long-distance dispersal. A previous study of L. anatifera throughout its wide distribution range provided a global view of its genetic differentiation pattern, revealing three major phylogenetic groups (Schiffer & Herbig, 2016).
In the Northwest Pacific (NWP), there are complex oceanographic current systems, including the Kuroshio Current (KC), Kuroshio Extension Current (KEC), and Oyashio Current (OC). The northward Kuroshio transported a large amount of heat from low latitudes to middle and high latitudes in a very narrow area, and the southward hydrophile transported low temperature and low salt water from high latitudes to low latitudes, resulting in a significant ocean temperature gradient in the Kuroshio Extension region (Qiu, 2001). Under the context of global warming, the Kuroshio is being intensified (Chen et al., 2019), and clarifying the changes in biodiversity and distribution of marine species affected by the Kuroshio will provide us with a better understanding of the impact of climate change on marine species. To identify and quantify the dynamic and thermodynamic processes governing the variability and the interaction between the Kuroshio Extension and the recirculation gyre, the inverted echo sounders with bottom pressure gauges and current meters, subsurface moorings, surface buoys, and dozens of Argo profiling floats have been set in the Kuroshio Extension region (Zhu et al., 2021). Lepas anatifera has been ubiquitously found on the surfaces of these buoys and subsurface moorings.
The pelagic barnacles L. anatifera on these moorings and buoys are important for studying the genetic diversity and the factors maintaining the genetic differentiation pattern in the NWP. In the present study, we focus on the genetic differentiation patterns of L. anatifera sampled vertically and horizontally from these buoys and subsurface moorings in both the South China Sea (SCS) and KEC. We aimed to investigate the following questions: (1) whether there is genetic divergence existing among NWP populations for L. anatifera; (2) if there is, what is the role of temperature driving the formation and maintenance of the corresponding pattern? Our results provide new insights into the mechanisms of the formation and maintenance of species' genetic patterns in the NWP. Moreover, our research approach is of general relevance for future studies on phylogeographic patterns and genetic adaptation of species distributed in the epipelagic zone of the open oceans.

| Sampling and sequencing
A total of 186 L. anatifera were collected from three sites in the South China Sea (SCS) and six sites in the Kuroshio Extension (KE) region ( Figure 1a, Table 1). Among all these sites, samples in KE-1, KE-2, and SCS-1 were sampled in the subsurface moorings (water depth > 100 m), and all the other sites were sampled in the fixed surface buoys (water depth < 50 m) (Figure 1b).
For sequencing mitochondrial cytochrome oxidase subunit 1 (COI), the total genomic DNA of the 186 L. anatifera samples was extracted from muscles either from the peduncle or the base of the cirri using a modified cetyltrimethylammonium bromide (CTAB) method

T A X O N O M Y C L A S S I F I C A T I O N
Biogeography (Ding et al., 2018). The universal primers LCO1490 and HCO219879 (Folmer et al., 1994) and the primers jgHCO2198 and jgLCO1490 (Geller et al., 2013) were used for polymerase chain reaction (PCR).
The forward and reverse sequences were assembled using SeqMan application of DNAStar software (SeqMan NGen®. Version 17.2. DNASTAR.). Multiple sequence alignment was then conducted using the plug-in Clustal W program in MEGA X (Kumar et al., 2018) with default parameter settings.   (Chen et al., 2018). The genome of L. anserifera (Cirripedia, Lepadidae) (Ip et al., 2021) was used as a reference genome and clean data were aligned to the genome with BWA (Li & Durbin, 2010). SNP calling was conducted using Samtools (Danecek et al., 2021) and the outputs were put into gstacks program (−-rm-unpaired-reads) in Stacks to build loci and identify SNPs by incorporating paired-end reads. The populations module (−r 0.6, −-filter-haplotype-wise) was used to filter the data. We retained only those loci that were genotyped in at least 60% of samples in a population and unshared SNPs were pruned to reduce haplotype-wise missing data. VCFtools (Danecek et al., 2011) was used to further filter the SNPs with the following settings: mini- and inclusion of only bi-allelic sites (−-min-alleles 2 --max-alleles 2).

| Genetic diversity and phylogenetic analyses
Molecular diversity for COI was estimated by the number of haplotypes (n), haplotype diversity (h), and nucleotide diversity (π c ) for each population using DnaSP 6.0 (Rozas et al., 2017

| Historical dynamics analysis
For COI, the neutrality test of Tajima's D (Tajima, 1989) and Fu's F S (Fu & Li, 1993) and mismatch distribution were calculated using Arlequin 3.1 (Excoffier et al., 2005). The mutation rate of COI estimated for crustacean species ranges from 1.4% to 2.33% per million years (Knowlton & Weigt, 1998;Schubart et al., 1998) and was used to estimate divergence time between lineages according to the K2P genetic distance calculated in MEGA X (Kumar et al., 2018). Besides, we used 1.865% as the average substitution rate to estimate the expansion time of each lineage based on the formula τ = 2μkt (Slatkin & Hudson, 1991), where τ is a population expansion parameter in mismatch distribution, μ is the average nucleotide substitution rate, and k is the numbers of nucleotides.

| Detection of genetic differentiation associated with temperature
To detect loci underlying local adaptation, a Bayesian approach as implemented in BAYENV2 (Coop et al., 2010;Günther & Coop, 2013) was applied to the entire SNPs dataset. The Bayesian approach considers the effect of population structure, using a covariance matrix based on neutral SNPs to control for demographic effects when testing for correlations between environmental factors and genetic differentiation (Coop et al., 2010). To identify the neutral loci, we first excluded the candidate loci under natural selection detected in BayeScan 2.1 (Foll & Gaggiotti, 2008).
BayeScan decomposes locus-population F ST coefficients into a population-specific component (beta), shared by all loci and a locusspecific component (alpha) shared by all the populations using a logistic regression (Foll & Gaggiotti, 2008). Loci with a positive value of alpha were considered to be under divergent selection. Prior odds of 100 were used for identifying the candidates of the selected loci, and then BayeScan was run with the settings of 100,000 iterations, a thinning interval of 10, 20 pilot runs of 5000 iterations each with a burn-in length of 50,000. Loci with a false discovery rate (FDR) of 0.05 were considered under selection. Then, we retained the SNPs within Hardy-Weinberg equilibrium (HWE) at p < .005 and further pruned the SNPs for linkage disequilibrium (LD) using PLINK (Purcell et al., 2007) at a thread of 0.2. The resulting SNPs were determined as "neutral SNPs." To ensure that the estimated matrix was convergent, four independent runs with different random seeds were run based on neutral loci. The average water temperatures in January (T1) and August (T8) were selected as the cold and warm months, respectively. As environmental factors that may constrain the survival and reproduction of L. anatifera (Patel, 1959), these temperature variables were tested for association with genetic variation.
Each environmental parameter was standardized by subtracting the mean and dividing it by the standard deviation of the parameter across all sampling sites. To reduce stochastic errors, the detection of environmental correlation was also run three times independently with different random seeds. SNPs with log 10 Bayes factor (BF) >1.5 (Jeffreys, 1961) in the results were identified as loci strongly associated with environmental variables. Candidate genes containing selected SNPs were functionally annotated with the ANNOVAR software (Wang et al., 2010).
To estimate the degree to which the genomic variations among different populations can be explained by temperature heterogeneity and to test the reliability of temperature-related outliers detected using BAYENV2, redundancy analysis (RDA) was performed based on adaptive data sets associated with temperature using the RDA function from the vegan package (Oksanen et al., 2013). Analyses of variance (ANOVAs) were performed to check the RDA model for significance and marginal ANOVAs (999 permutations) were run to determine whether the two temperature variables were significantly correlated with genetic variations.

| Population structure
Genetic differentiation between populations was represented by pairwise F ST values (Weir & Cockerham, 1984), which were estimated based on COI data using Arlequin 3.1 (Excoffier et al., 2005) with 1000 permutations performed to test for significance.
Isolation by distance (IBD) model is often used to test whether the migration of populations conforms to the stepping-stone model (Bohonak, 2002). If the population conforms to the IBD model, the genetic distance between populations [F ST /(1-F ST )] is linearly correlated with geographic distance. The correlation between geographic distance and genetic distance was detected using the Mantel test in GenAlEx 6.5 (Peakall & Smouse, 2012) after the geographic distance (GGD) and genetic distance matrix (GD) of COI and neutral SNPs were generated.
Population structure and individual ancestry were predicted based on all SNPs, neutral SNPs, and temperature-related outliers, respectively, using ADMIXTURE v.1.3.0 (Alexander et al., 2009), which implemented a maximum likelihood (ML) estimation method.
The number of genetic groups represented by K values was predefined from one to nine. The optimal K values were assessed using  Table 2).
The results of IBD revealed no linear relationship between geographic distance and genetic distance for L. anatifera populations (COI: R 2 = 0.0697, p = .061) ( Figure S1).

| Historical demography
The mean net genetic distances (±SE) among the three lineages were shown as follows: 0.185 (0.045) between lineage 1 and lineage 3,  (Table 3). In the sequential mismatch distributions, all three lineages displayed a unimodal distribution. BSP analysis also indicated that three lineages had experienced demographic increases ( Figure S2).

| Selection for temperature-associated outlier SNPs
Raw The BAYENV analysis based on all 15,238 SNPs revealed that outlier loci could be screened based on both temperature variables (T1 and T8) (Figure 4a,b). With the criterion of log 10 Bayes factor (BF) greater than 1.5, 226 candidate SNPs were identified as "outlier SNPs" associated with local temperature variables. Among these candidate SNPs, 213 and 59 SNPs were strongly correlated with the average temperature in January (T1) and August (T8), respectively, and 46 SNPs were correlated to both temperature variables. The 226 SNPs significantly associated with temperature variables were also identified as outliers by BayeScan.
Correlation coefficients between T1 and T8 was 0.89, and the variance inflation factor (VIF) values for the two predictors were below 5, indicating that multicollinearity between the two predictors should not be a problem for the model. For the RDA analysis based on the 226 SNPs associated with local temperature, the RDA model was globally significant (p = .001) with an adjusted coefficient of determination (R adj 2 ) of 0.487, with T1 and T8 accounting for 55.36% and 44.72% of the total explained variance, respectively. The first two axes of the RDA accounted for 99.51% and 0.49% of the genetic variations, respectively (Figure 4c). The results of marginal ANOVAs for the RDA indicated that T1 and T8 were significant predictors of the putatively adaptive genetic variation (p = .001).
Among the 226 temperature-related "outlier SNPs," a total of 19 SNPs were annotated to five genes (  (Baltimore, 1970). The 60 S ribosomal protein L31 coding by the gene containing SNP_127,165:96 was predicted to be a structural constituent of the ribosome and involved in cytoplasmic translation (UniProtKB ID: A0A6A4WLN2).
More details of the putative functions of these proteins can be found in Table 4.

| Genetic differentiation pattern of the pelagic barnacle Lepas anatifera in the Northwest Pacific
The goose barnacle Lepas anatifera exhibits high abundance worldwide (Young, 1990). In the present study, phylogenetic trees, pair- what has been seen in other barnacles (Chan et al., 2007a(Chan et al., , 2007bChang et al., 2017;Tsang et al., 2012). For example, the currentdriven northward range extension of the intertidal acorn barnacle Hexechamaesipho pilsbryi (Tsang et al., 2013).
The horizontal population differentiation or vertical differentiation of macrobenthos has been reported in previous studies (Bik et al., 2010;Ni et al., 2017;Pardo-Gandarillas et al., 2018;Sun et al., 2017Sun et al., , 2022Wang et al., 2022). However, studies of intraspecific population differentiation at both horizontal and vertical scales for epipelagic macrobenthos were still limited. In a previous study by Schiffer and Herbig (2016), the phylogeny of L. anatifera based on samples collected globally was investigated and three major groups of L. anatifera were revealed, inferring the allopatric differentiation of L. anatifera (Schiffer & Herbig, 2016). The NJ tree based on COI data showed that lineage 3 of the South China Sea identified in the present study corresponded to the global group from the study of Schiffer and Herbig (2016), suggesting the global distribution of lineage 3. There were no sequences downloaded from NCBI clustered with the lineage 1 and lineage 2 of the KE region, probably because the downloaded sequences were not comprehensive enough, and for the subsurface lineage 1, it was also likely due to the limitation of no subsurface sampling of the previous study. In the present study, the genetic distance between the surface lineage 2 of the KE region and lineage 3 of the South China Sea was smaller than that between lineage 1 and lineage 2, indicating higher genetic similarity despite geographical distance. The larger genetic differentiation between different vertical sites possibly was related to the different adaptability to the specific microhabitats (Han & Dong, 2020;Li, Tan, et al., 2021) and also implied the importance of small-scale environmental heterogeneity in maintaining the genetic differentiation pattern. Due to the limitations of the sampling method (i.e., no fixed buoys or mooring system in the same station), in the present study, we cannot get samples from different water depths in the same station, which limits us from analyzing vertical differentiation in the same region. Hence, more samples from different vertical microhabitats should be collected for analysis in the future.
The barnacles L. anatifera in different populations face divergent thermal environments. This species usually thrives in tropical and subtropical waters where sea temperatures exceed 18-20°C (Hinojosa et al., 2006;Patel, 1959), and has been thought to survive in waters warmer than 15°C (Patel, 1959). The water temperatures in the SCS (0 m to 100 m in depth) are higher than 20°C in all seasons (Liu & Gan, 2017). In the surface sites in the KE region, temperatures are above 15°C all year round except for the KE-3 where temperatures can be less than 15°C in the colder months. However, as for the subsurface sites in the KE region (i.e., KE-1 and KE-2), the water temperatures are below 15°C throughout the year (Wang et al., 2020).
The low water temperature for the subsurface KE population (below F I G U R E 6 Admixture structure of SNP-based co-ancestries for Lepas anatifera samples (K = 3) using (a) all SNPs, (b) neutral SNPs, and (c) outlier SNPs.

Lineage1
Lineage2 Lineage3 anatifera from the fixed buoys in the NWP.
The redundancy analysis based on the genotype-environment association method further confirmed the roles of temperature heterogeneity in maintaining the genetic differentiation of L. anatifera in the northwest Pacific. As previous studies reported (Han & Dong, 2020;Hu & Dong, 2022b), the genetic variations among populations of the mussel Mytilus galloprovincialis and the oyster Crassostrea sikamea were significantly correlated with the environmental temperatures.
In the present study, the results of RDA analysis revealed that local temperature, especially the low temperature in January (e.g., T1) was a significant factor influencing the distribution patterns of putatively adaptive genetic variation among populations. Among the 19 candidate adaptive loci, cuticle protein 6 possibly plays an important role in enhancing stress resistance in the cuticle of L. anatifera (Li & Denlinger, 2009), and in the adaptation to local thermal environments.
Our results also indicated the potential possibility of hybridization and genetic introgression between different lineages. Both the SNPs-based NJ tree and PCA plot indicated that the populations in the SCS were genealogically homogeneous, but there were mixed lineages present in the surface populations of the KE region. Particularly in the KE-6 population, there was a sole population with a mixed distribution of three lineages. Furthermore, three individuals in the KE-6 population showed an exceptional trend: their GBS-seq belonged to lineage 1 or lineage 3, but their COI sequences were all grouped into lineage 2. Such topological inconsistency had been found in other species, such as the Japanese mantis shrimp Oratosquilla oratoria (Cheng & Sha, 2017) and the pantropical green seaweed genus Halimeda (Verbruggen et al., 2005).
All these results have shown that secondary contact between highdiverged lineages can cause hybridization and genetic introgression.
Under the context of continuous global warming, the introgression from high-temperature-adapted lineage may bring genotypes that allow individuals adapted to lower temperatures to better adapt to the warming ocean (Hu & Dong, 2022a;Prada & Hellberg, 2021).
Furthermore, incomplete lineage sorting could also lead to polytomies in the phylograms and the inconsistent lineage between mtCOI and nuclear genes (Cheang et al., 2010). Whether such a situation occurred between lineages of L. anatifera remains to be further tested.

| The critical role of fixed substrate in investigating population connectivity in the open ocean
The placement of moorings and buoys in the KE region provides an ideal system for our studies of L. anatifera genetic differentiation patterns. The necessity of conducting the study either of freedrifting debris in situ or the use of moorings as proxies had been mentioned in a recent study (Mesaglio et al., 2021). In the present study, L. anatifera specimens were sampled from fixed surface buoys and subsurface moorings placed in the northwest Pacific, allowing us to observe the divergence in the vertical direction. These artificial structures are helpful for our investigation of the distribution pattern and the corresponding formation and maintenance mechanisms of sessile macrobenthos in the open oceans.
Goose barnacles in the genus Lepas (family Lepadidae) are the most abundant and widespread biofouling taxa globally (Thiel & Gutow, 2005), with a planktonic larval stage of up to 2 months (Anderson, 1994;Darwin, 1851;Hinojosa et al., 2006). However, there were limited reports of goose barnacles distributed in deeper water layers. Due to the limitations of sample collection, most previous studies have defined the goose barnacles of Lepas as epipelagic rafters (Schiffer & Herbig, 2016). So far, most studies focused on the effects of floating objects on the long-distance dispersal of the species and applied SST as an indicator for genetic differentiation. The present study emphasized the existence of genetic differentiation in the vertical direction and raised the importance of the vertical temperature profile as the maintaining force of genetic differentiation in the ocean. Therefore, micro-geographic adaptation in different spatial scales should be valued to understand how species respond to their microclimates (Bates et al., 2018;Richardson et al., 2014).

ACK N OWLED G M ENTS
This work was supported by the National Natural Science China for assistance in sample collection. We thank Tao Wu for providing the in situ photograph of Lepas anatifera attached to the buoys and moorings.

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors have no conflict of interest to declare.

DATA AVA I L A B I L I T Y S TAT E M E N T
The genetic data generated during the present study are available from the NCBI Sequence Read Archive (www.ncbi.nlm.nih.gov/ genbank, BioProject ID: PRJNA875021; BioSample accessions: SAMN30608385-SAMN30608500. GenBank accessions of mtDNA COI: OP215356-OP215545).